1 Identification of (mis)classified substances as a consequence of the CLP minimum classification: application to oral toxicity using data from eChemPortal.

1.1 Background

Replacement of the Dangerous Substances Directive (DSD) by the CLP Regulation prompted major re-classification work to adhere to the new criteria. The old and new classification criteria for acute toxicity don’t match exactly. In case of ambiguity, the minimum classification concept assigns the substance to the CLP category of lower toxicity (see image below for clarification). This may have led to classifications which are not supported by the experimental data, which can be shown e.g. with the REACH registration data.

Classification boundaries for DSD and CLP and the minimum classification, in this case the criteria for oral acute toxicity. For dermal and inhalation exposure, the concentration ranges where the minimum classification procedure leads to a “wrong” classification are wider for some category boundaries.

1.2 Get data with CLP classifications

ECHA provides an (unofficial) table of the harmonized classifications as an Excel file. The substances falling under the minimum classification are marked with an ’*’.

Unfortunately, an Excel file is the only available format. It contains text in the first 4 rows, the table headers in the next two rows, and the data starting with row 7. Parsing two rows as table headers is not supported by readxl::read_excel (I don’t know about xlsx::read.xlsx), but the issue discussion on GitHub provides a working solution.
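
The core of such a workaround is to merge the two header rows into a single vector of column names before reading the data. The following is only a minimal base R sketch of that idea, not the solution from the GitHub discussion: the function name merge_header_rows and the example header values are assumptions made for illustration.

```r
# Hypothetical header rows, shaped like the two header rows of the Excel file
top    <- c("Index No", "Classification", NA, "Labelling")
bottom <- c(NA, "Hazard Class and Category Code(s)",
            "Hazard Statement Code(s)", "Pictogram, Signal Word Code(s)")

# Merge two header rows into one vector of column names:
# take whichever row is filled, and join both with "_" when both are filled
merge_header_rows <- function(top, bottom) {
  ifelse(is.na(top), bottom,
         ifelse(is.na(bottom), top, paste(top, bottom, sep = "_")))
}

col_names <- merge_header_rows(top, bottom)
print(col_names)
```

The resulting vector could then be handed to readxl::read_excel via its col_names argument, with skip set so that reading starts at the first data row.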

## New names:
## * `` -> `..1`
## * `` -> `..2`
## * `` -> `..3`
## * `` -> `..4`
## * `` -> `..5`
## * … and 7 more
glimpse(clp_table)
## Observations: 4,249
## Variables: 12
## $ `Index No`                                         <chr> "001-001-00-9…
## $ `International Chemical Identification`            <chr> "hydrogen", "…
## $ `EC No`                                            <chr> "215-605-7", …
## $ `CAS No`                                           <chr> "1333-74-0", …
## $ `Classification_Hazard Class and Category Code(s)` <chr> "Flam. Gas 1\…
## $ `Hazard Statement Code(s)`                         <chr> "H220\r\n\r\n…
## $ `Labelling_Pictogram, Signal Word Code(s)`         <chr> "GHS02\r\nGHS…
## $ `Hazard statement Code(s)`                         <chr> "H220\r\n", "…
## $ `Suppl. Hazard statement Code(s)`                  <chr> NA, NA, NA, N…
## $ `Specific Conc. Limits, M-factors`                 <chr> NA, NA, NA, N…
## $ Notes                                              <chr> "U", NA, NA, …
## $ `ATP inserted/ATP Updated`                         <chr> "CLP00", "CLP…

Several variables look messy. For the purpose of this analysis we only need the hazard class categories, the hazard statement codes and some identifier of the substances. Drop everything else and rename the variables:

clp_table <- clp_table %>%
  transmute(
    substance = `International Chemical Identification`,
    ec = `EC No`,
    cas = `CAS No`,
    clp_class = `Classification_Hazard Class and Category Code(s)`,
    h_code = `Hazard Statement Code(s)`
  )

Further, a single Excel cell can contain multiple hazard categories. This needs to be tidied up first.

clp_table_tidy <- clp_table %>%
  separate_rows(clp_class, h_code, sep = "\r\n") %>%
  filter(str_length(clp_class) > 0)

clp_table_tidy %>% pull(clp_class) %>% 
  table %>% sort %>% 
  tail(20)
## .
##        Press. Gas       Asp. Tox. 1          Carc. 1A Aquatic Chronic 4 
##               194               198               239               253 
##         STOT SE 3     Skin Corr. 1B       STOT RE 2 *    Acute Tox. 2 * 
##               270               279               297               302 
##      Eye Irrit. 2 Aquatic Chronic 3          Muta. 1B     Skin Irrit. 2 
##               396               408               426               480 
##        Eye Dam. 1 Aquatic Chronic 2          Carc. 1B    Acute Tox. 3 * 
##               526               571               680               693 
## Aquatic Chronic 1      Skin Sens. 1   Aquatic Acute 1    Acute Tox. 4 * 
##              1001              1010              1111              1516
write_csv(x = clp_table_tidy, path = "data/clp_table_atp10_20190214.csv")

The cleaned CLP data (10th ATP) is exported to “data/clp_table_atp10_20190214.csv” for later re-use.

Select all observations where the hazard class category contains “Acute” and ends with ’*’.

clp_minimum <- clp_table_tidy %>%
  filter(str_detect(clp_class, "Acute.*\\*$")) # select all observations where the hazard class category contains acute tox and  ends with '*'

clp_minimum %>% 
  nrow
## [1] 2511
clp_minimum %>% 
  pull(substance) %>% 
  unique %>% 
  length
## [1] 1576

2511 minimum classifications distributed over 1576 substances.
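
To see what the pattern used above does and does not catch, it can be tried on a few class strings. This is a small base R sketch; the vector of example strings is made up from class names that appear in the table above, and the stricter pattern is only a suggestion.

```r
# Example class strings (hypothetical selection for illustration)
classes <- c("Acute Tox. 4 *", "Acute Tox. 3", "STOT RE 2 *", "Aquatic Acute 1")

# Pattern from the analysis: contains "Acute" and ends with a literal '*'
is_minimum <- grepl("Acute.*\\*$", classes)

# Stricter alternative (an assumption, not used in the analysis):
# anchors the match to "Acute Tox." so that aquatic classes can never match
is_minimum_strict <- grepl("^Acute Tox\\. \\d \\*$", classes, perl = TRUE)

print(data.frame(classes, is_minimum, is_minimum_strict))
```

Both patterns agree on these examples; they would only differ if an “Aquatic Acute” class ever carried an asterisk.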

1.3 For how many of these substances does available data not support the classification?

1.3.1 Sources of experimental data

  • ECHA dissemination database: For registered substances, this would certainly be the best source. Unfortunately, ECHA provides no API (subject to change) and it is a bit unclear whether scraping the webpage is allowed. (ECHA prohibits automated scraping of substantial amounts of data… but what is substantial?)
  • eChemPortal: No API, but a batch search that should suffice for this purpose. Contains REACH data, but apparently only the 2017 database dump, which would mean more recent data can’t be found. However, I checked some registrations from 2018 manually and the data seem to be included until April or May 2018.
  • OECD Toolbox: Contains the data from the 2017 REACH database dump, with a considerable amount of detail (e.g. test item information in natural language). Need to check this later.
  • LRI Ambit2: Also contains the data from the 2017 REACH database dump. Possibly has an API and is developed by Ideaconsult (a good sign for data accessibility). Need to check this later.
  • IUCLID API with database dump: IUCLID6 provides a REST API. However, it is very slow and inconvenient: information is coded, and the codes are available in the iuclid-data documentation, but not readily in machine-readable format. Also, study records are not directly accessible/searchable, but only via the study record UUIDs, which in turn have to be requested first with the dossier UUID etc. Not a viable solution for quickly getting the acute tox data. (Further info/experience on the IUCLID API wanted.)
  • EPA chemistry dashboard: Has some acute tox data without much detail and without references. Apparently exports only the presence of data, not the values themselves? Also, some grave errors were apparent when looking at some exemplary tox data; it looked like a parsing mistake had been made.
  • HSDB via PubChem or directly
  • NIOSH-RTECS via PubChem or directly. Direct access to RTECS probably difficult
  • tbc: there must be more

1.3.2 Conclusion where to get data

eChemPortal or the OECD Toolbox seem to be the best solution for now. I’m going to use eChemPortal for the oral route, and probably another data source later for dermal or inhalation toxicity.

1.4 Use data from eChemPortal to get acute toxicity effect levels

1.4.2 Processing the exported results from eChemPortal

For now, I’ll focus on oral studies only, although the minimum classification probably has the least implications for this route. I’ll deal with inhalation and dermal later, maybe with different methodology.

Each query result from eChemPortal was exported as one csv file. Create one tibble from all csv files and create better variable names.

files <- list.files(path = "data/echemportal/", pattern = "oral.*csv$", full.names = TRUE)

studies_oral <- map(files, read_csv2) %>% bind_rows

studies_oral_clean <- studies_oral %>% 
  transmute(name = `Substance Name`,
            substance_number = `Substance Number`,
            number_type = `Number type`,
            link = `Endpoint Link`,
            route = str_extract(Section, "(?<=Acute toxicity: ).+"),
            reliability = str_extract(Values, "(?<=Reliability: )(\\d)"),
            effect_levels = str_extract(Values, "Effect levels(.|[:space:])*$")
  ) %>% 
  separate_rows(effect_levels, sep = "\\n\\r\\n\\r") %>% 
  mutate(
    descriptor = str_extract(effect_levels, "(?<=Dose descriptor: ).+(?<!\\s)"),
    effect_level = str_extract(effect_levels, "(?<=Effect level: ).+")
  ) %>% 
  select(-effect_levels)

Extract the effect levels (doses), their quantifiers, and the units.

studies_oral_clean <- studies_oral_clean %>% 
  mutate(
    effect_level_quantifier = str_extract(effect_level, "^\\D+(?=\\s)"),
    effect_level_dose = as.numeric(str_extract(effect_level, "\\d+(\\.\\d+)?")), # first number, including an optional decimal part
    unit = str_extract(effect_level, "(?<=\\d\\s)\\D+$")
  )
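
The three pieces of an effect level string can be illustrated on a few made-up examples. The following base R sketch is not the code used in the analysis: the function name parse_effect_level is hypothetical, and it assumes that values use a decimal point and no thousands separator.

```r
# Split strings such as "> 2000 mg/kg bw" into quantifier, dose and unit.
# Assumption: decimal point, no thousands separator, unit follows the number.
parse_effect_level <- function(x) {
  quantifier <- trimws(sub("^(\\D*)\\d.*$", "\\1", x, perl = TRUE))
  quantifier[quantifier %in% ""] <- NA   # no quantifier given
  dose <- as.numeric(sub("^\\D*(\\d+(\\.\\d+)?).*$", "\\1", x, perl = TRUE))
  unit <- trimws(sub("^\\D*\\d+(\\.\\d+)?\\s*", "", x, perl = TRUE))
  data.frame(quantifier, dose, unit, stringsAsFactors = FALSE)
}

# Hypothetical example strings
examples <- c("> 2000 mg/kg bw", "ca. 900 mg/kg bw", "36 mg/kg bw", "0.5 mg/kg bw")
parsed <- parse_effect_level(examples)
print(parsed)
```

Strings without any digit are not handled here; they would need a separate check before conversion.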

Keep only those with a suitable dose descriptor.

studies_oral_clean <- studies_oral_clean %>% 
  filter(str_detect(descriptor, "^LD50$|^approximate LD50$"))


studies_oral_clean %>%
  pull(number_type) %>%
  table
## .
## CAS Number  EC Number    Unknown 
##      16403        262       2395

Those with EC number can be converted to CAS numbers or handled via EC number in case they have no CAS-RN. However, there are over 2000 unknowns. What are these?

studies_oral_clean %>% 
  filter(str_detect(number_type, "Unknown"))  %>% 
  select(name, substance_number) %>% 
  head(20)
## # A tibble: 20 x 2
##    name                                                    substance_number
##    <chr>                                                   <chr>           
##  1 (1R,2S,7R,8S)-11-(dichloromethylidene)tricyclo[6.2.1.0… <NA>            
##  2 (1Z)-1-chloroprop-1-ene; 2-chloroprop-1-ene; 2-chlorop… <NA>            
##  3 (2R,4R)-4-(2,3-dimethylbutan-2-yl)-2-methylcyclohexan-… <NA>            
##  4 (2R,4R)-4-(2,3-dimethylbutan-2-yl)-2-methylcyclohexan-… <NA>            
##  5 (2S)-2-{[(benzyloxy)carbonyl]amino}-3,3-dimethylbutano… <NA>            
##  6 (2S)-2-{[(benzyloxy)carbonyl]amino}-3,3-dimethylbutano… <NA>            
##  7 (2Z)-4-{[(9Z)-18-[2,3-bis({[(9Z)-12-hydroxyoctadec-9-e… <NA>            
##  8 (3E)-4-(2,6,6-trimethylcyclohex-1-en-1-yl)but-3-en-2-o… <NA>            
##  9 (3E)-4-(2,6,6-trimethylcyclohex-2-en-1-yl)but-3-en-2-o… <NA>            
## 10 (4E)-3-({3-[3-(2,3-dihydroxypropoxy)-2-hydroxypropoxy]… <NA>            
## 11 (5R)-5-[(1S)-1,2-dihydroxyethyl]-4-hydroxy-3-{[(2R,3R,… <NA>            
## 12 -                                                       <NA>            
## 13 -                                                       <NA>            
## 14 -                                                       <NA>            
## 15 -                                                       <NA>            
## 16 -                                                       <NA>            
## 17 -                                                       <NA>            
## 18 -                                                       <NA>            
## 19 -                                                       <NA>            
## 20 -                                                       <NA>

Those with a name seem to be reaction masses. Those without a name seem to be primarily UVCBs and multi-constituent substances. Both are probably not too important for this analysis. However, they should all have an EC number, and it is unclear to me why eChemPortal does not pass on the substance number data.

Still, trying to clean up a bit…

1.5 Convert EC to CAS

1.5.1 Any substances in the CLP list that have EC, but no CAS?

clp_minimum %>% 
  filter(!is.na(ec) & is.na(cas)) %>% 
  head
## # A tibble: 6 x 5
##   substance                                ec      cas   clp_class   h_code
##   <chr>                                    <chr>   <chr> <chr>       <chr> 
## 1 hydrazine bis(3-carboxy-4-hydroxybenzen… 405-03… <NA>  Acute Tox.… H302  
## 2 sodium((n-butyl)x(ethyl)y-1,5-dihydro)a… 418-72… <NA>  Acute Tox.… H332  
## 3 α-hydroxypoly(methyl-(3-(2,2,6,6-tetram… 404-92… <NA>  Acute Tox.… H312  
## 4 α-hydroxypoly(methyl-(3-(2,2,6,6-tetram… 404-92… <NA>  Acute Tox.… H302  
## 5 reaction mass of: 4-[[bis-(4-fluorophen… 403-25… <NA>  Acute Tox.… H302  
## 6 reaction product of: (2-hydroxy-4-(3-pr… 401-53… <NA>  Acute Tox.… H332

Yes, several. This should be fixed.

1.5.2 Convert EC to CAS

The webchem package (on GitHub) is very nice for various conversions of chemical identifiers via CTS and PubChem. However, EC -> CAS is not directly supported, as neither PubChem nor CTS accepts EC numbers as input. One possibility is to use the EC inventory and do a lookup. However, the EC inventory export is limited to 20000 entries. For this purpose the list of registered substances should be sufficient. It currently has a little over 20000 entries, but if only full registrations are exported, the list is well within ECHA’s download limit.

1.5.3 Using ECHA’s list of registered substances for the EC to CAS lookup

echa_substances <- read_delim(file = "data/reg-substances-export.csv", delim = "\t", skip = 34)
## Warning: Missing column names filled in: 'X9' [9]
## Parsed with column specification:
## cols(
##   Name = col_character(),
##   `EC / List Number` = col_character(),
##   `Cas Number` = col_character(),
##   `Registration Type` = col_character(),
##   `Submission Type` = col_character(),
##   `Total tonnage Band` = col_character(),
##   `Factsheet URL` = col_character(),
##   `Substance Information Page` = col_character(),
##   X9 = col_logical()
## )
# cleaning 
echa_substances <- echa_substances %>% 
  transmute(name_echa = Name,
            cas_echa = `Cas Number`,
            ec_echa = `EC / List Number`) %>% 
  distinct(ec_echa, .keep_all = TRUE) # needed because echa_substances contains duplicates from
                                      # joint/individual submissions


# which are identified by EC Number
studies_oral_clean %>% 
  filter(str_detect(number_type, "EC"))
## # A tibble: 262 x 11
##    name  substance_number number_type link  route reliability descriptor
##    <chr> <chr>            <chr>       <chr> <chr> <chr>       <chr>     
##  1 (4E,… 484-460-1        EC Number   http… oral  1           LD50      
##  2 (4E,… 484-460-1        EC Number   http… oral  2           LD50      
##  3 -     449-400-0        EC Number   http… oral  1           LD50      
##  4 -     441-100-8        EC Number   http… oral  1           LD50      
##  5 -     413-110-2        EC Number   http… oral  1           LD50      
##  6 -     452-810-2        EC Number   http… oral  1           LD50      
##  7 -     413-110-2        EC Number   http… oral  2           LD50      
##  8 -     451-540-2        EC Number   http… oral  1           LD50      
##  9 -     443-400-4        EC Number   http… oral  2           LD50      
## 10 -     476-160-4        EC Number   http… oral  1           LD50      
## # … with 252 more rows, and 4 more variables: effect_level <chr>,
## #   effect_level_quantifier <chr>, effect_level_dose <dbl>, unit <chr>
# left_join

studies_oral_ec <- studies_oral_clean %>% 
  filter(str_detect(number_type, "EC")) %>% 
  left_join(echa_substances, by = c("substance_number" = "ec_echa")) %>% 
  mutate(cas_echa = ifelse(cas_echa == "-", NA, cas_echa)) %>% 
  filter(!is.na(cas_echa))
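
The lookup itself boils down to matching EC numbers against a reference table and treating “-” as a missing CAS number. A minimal base R sketch with a toy table: the hydrogen identifiers are taken from the CLP table above, while the EC numbers starting with 999 and the object names are hypothetical placeholders.

```r
# Toy version of the registered-substances table; "-" marks a missing CAS number
echa_lookup <- data.frame(
  ec_echa  = c("215-605-7", "999-999-9"),   # second entry is a made-up placeholder
  cas_echa = c("1333-74-0", "-"),
  stringsAsFactors = FALSE
)

# EC numbers as they might appear in the study records (last two are placeholders)
study_ec <- c("215-605-7", "999-999-9", "999-000-0")

# Look up the CAS number for each EC number; unmatched EC numbers give NA
cas <- echa_lookup$cas_echa[match(study_ec, echa_lookup$ec_echa)]
cas[cas %in% "-"] <- NA   # a listed substance without CAS number counts as missing

print(data.frame(study_ec, cas))
```

Only the first EC number resolves to a CAS number; the other two end up as NA for different reasons (listed without CAS, and not listed at all).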

1.5.4 Same for entries without any number: trying to look up the substances by name in ECHA’s list of registered substances

This is likely to fail, as these are probably substances without a CAS number anyway.

studies_oral_name <- studies_oral_clean %>% 
  filter(str_detect(number_type, "Unknown") |
         (number_type == "EC Number" & !(name %in% studies_oral_ec$name)) # number_type holds "EC Number", not "EC"
         ) %>% 
  left_join(echa_substances, by = c("name" = "name_echa")) %>% 
  mutate(cas_echa = ifelse(cas_echa == "-", NA, cas_echa))


studies_oral_name %>% 
  filter(is.na(cas_echa) & is.na(ec_echa))
## # A tibble: 2,348 x 13
##    name  substance_number number_type link  route reliability descriptor
##    <chr> <chr>            <chr>       <chr> <chr> <chr>       <chr>     
##  1 (1R,… <NA>             Unknown     http… oral  1           LD50      
##  2 (1Z)… <NA>             Unknown     http… oral  1           LD50      
##  3 (2R,… <NA>             Unknown     http… oral  1           LD50      
##  4 (2R,… <NA>             Unknown     http… oral  1           LD50      
##  5 (2Z)… <NA>             Unknown     http… oral  1           LD50      
##  6 (3E)… <NA>             Unknown     http… oral  1           LD50      
##  7 (3E)… <NA>             Unknown     http… oral  1           LD50      
##  8 (4E)… <NA>             Unknown     http… oral  1           LD50      
##  9 -     <NA>             Unknown     http… oral  1           LD50      
## 10 -     <NA>             Unknown     http… oral  2           LD50      
## # … with 2,338 more rows, and 6 more variables: effect_level <chr>,
## #   effect_level_quantifier <chr>, effect_level_dose <dbl>, unit <chr>,
## #   cas_echa <chr>, ec_echa <chr>

There actually shouldn’t be any entries without an EC number, because we’re using ECHA-disseminated data here. All entries should have at least an EC number.

studies_oral_name <- 
  studies_oral_name %>% 
  mutate(name = ifelse(name == "-", NA, name)) %>%
  mutate(name = ifelse(str_detect(name, "unknown|unnamed"), NA, name)) %>% 
  filter(!(is.na(cas_echa) & is.na(name)))

1.6 Combine the cleaned lists

I.e. the list with substances that can be identified by CAS + the list with CAS identifiers generated by EC numbers + the list with substances identified by their name.

1.6.1 Step 1: For those with identification by EC number and successful CAS number lookup: replace EC by CAS

First replace EC by CAS in studies_oral_ec, then bind this to the part of studies_oral_clean that is not identified by EC number.

studies_oral_ec <- studies_oral_ec %>% 
  mutate(substance_number = cas_echa, 
         number_type = "CAS Number")

studies_oral_clean <- bind_rows(studies_oral_ec, studies_oral_clean %>%
              filter(number_type != "EC Number")
  )

1.6.2 Step 2: For those with identification by EC or by CAS after successful name lookup: replace EC by CAS

First set the substance number to CAS for all entries in studies_oral_name that have a CAS number, then bind this to the part of studies_oral_clean where the number type is not “Unknown”.

studies_oral_name_cas <- studies_oral_name %>% # separate object, so the rows without CAS stay available for step 3
  filter(!is.na(cas_echa)) %>% 
  mutate(substance_number = cas_echa,
         number_type = "CAS Number")


studies_oral_clean <- bind_rows(studies_oral_name_cas, studies_oral_clean %>%
              filter(number_type != "Unknown")
  ) 

1.6.3 Step 3: For those with no identification by EC or by CAS after name lookup: use the name to get the classification

studies_oral_clean <-
  studies_oral_name %>%
  filter(is.na(cas_echa)) %>%
  mutate(number_type = "use_name") %>%
  bind_rows(studies_oral_clean %>%
              filter(number_type != "Unknown") # redundant with current processing
            )

Adds some 160 entries. Probably not really worth it.

1.7 Try to derive CLP classification based on experimental data

The classification limits for oral toxicity (LD50 in mg/kg bw; lower limit exclusive, upper limit inclusive) are:

category   lower limit   upper limit
1          NA            5
2          5             50
3          50            300
4          300           2000
none       2000          NA

studies_oral_cat_derived <- studies_oral_clean %>% 
  mutate(
    # type of reported value; '=' does not exist, in this case the quantifier is NA
    exact = is.na(effect_level_quantifier),
    lower = !exact & str_detect(effect_level_quantifier, ">|ca"), # '>' and 'ca.' are treated alike
    upper = !exact & str_detect(effect_level_quantifier, "<"),
    # the first matching condition wins; exact values follow the CLP limits (upper limit inclusive)
    category_from_exp = case_when(
      # handle not classified
      exact & effect_level_dose >  2000 ~ "not classified",
      lower & effect_level_dose >= 2000 ~ "not classified",
      # handle cat. 4
      exact & effect_level_dose >  300 ~ "4",
      lower & effect_level_dose >= 300 ~ "4",
      upper & effect_level_dose >  300 ~ "4",
      # handle cat. 3
      exact & effect_level_dose >  50 ~ "3",
      lower & effect_level_dose >= 50 ~ "3",
      upper & effect_level_dose >  50 ~ "3",
      # handle cat. 2
      exact & effect_level_dose >  5 ~ "2",
      lower & effect_level_dose >= 5 ~ "2",
      upper & effect_level_dose >  5 ~ "2",
      # handle cat. 1
      exact & effect_level_dose <= 5 ~ "1",
      lower & effect_level_dose <  5 ~ "1",
      upper & effect_level_dose <= 5 ~ "1"
    )
  ) %>% 
  select(-exact, -lower, -upper) # drop the helper columns
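
For LD50 values reported without a quantifier, the CLP intervals can be cross-checked with base R’s cut(), which makes the boundary handling explicit (upper limit inclusive). This is a sketch for exact values only; the function name classify_oral_ld50 is hypothetical and quantifiers such as “>” or “<” are not covered.

```r
# Assign the CLP acute oral toxicity category to exact LD50 values (mg/kg bw).
# Intervals are right-closed: (-Inf, 5], (5, 50], (50, 300], (300, 2000], (2000, Inf)
classify_oral_ld50 <- function(ld50) {
  as.character(cut(ld50,
                   breaks = c(-Inf, 5, 50, 300, 2000, Inf),
                   labels = c("1", "2", "3", "4", "not classified"),
                   right = TRUE))
}

# Values on and just above the category boundaries
boundary_check <- classify_oral_ld50(c(5, 5.1, 50, 300, 2000, 2001))
print(boundary_check)
```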

Make sure that all are classified

studies_oral_cat_derived %>% 
  filter(is.na(category_from_exp)) %>% 
  nrow
## [1] 0
studies_oral_cat_derived %>% 
  group_by(category_from_exp) %>% 
  tally %>% 
  ggplot(aes(x = category_from_exp, y = n)) +
  geom_bar(stat = "identity")

1.8 Combine harmonized classification with classification derived from experimental data

data_oral_combined_cas <- clp_minimum %>% 
  filter(str_detect(h_code, "H30\\d")) %>% # select only acute oral toxicity
  full_join(studies_oral_cat_derived %>% 
               select(name, substance_number, number_type, link, category_from_exp, reliability, effect_level_quantifier, effect_level_dose, unit) %>% 
               filter(number_type == "CAS Number"),
               by = c("cas" = "substance_number")
               )


data_oral_combined_name <- clp_minimum %>% 
  filter(str_detect(h_code, "H30\\d")) %>% # select only acute oral toxicity
  full_join(studies_oral_cat_derived %>% 
               select(name, substance_number, number_type, link, category_from_exp, reliability, effect_level_quantifier, effect_level_dose, unit) %>% 
               filter(number_type == "use_name"),
               by = c("substance" = "name")
               ) %>% 
  filter(!is.na(number_type))

data_oral_combined_name$substance %in% data_oral_combined_cas$substance %>% 
  table
## < table of extent 0 >

Not a single study record of a substance that could not be identified by CAS/EC could be linked to the CLP table by the substance name. Still worth checking, as this could be different for the other routes.

data_oral_combined_cas %>% 
  filter(is.na(cas) | cas == "-") %>% 
  nrow
## [1] 139

Very few are left without CAS; going to ignore these.

data_oral_combined <- data_oral_combined_cas %>% 
  filter(!is.na(cas) & cas != "-")

data_oral_combined %>% 
  group_by(cas) %>%
  summarise(has_experimental_data = any(!is.na(category_from_exp))) %>% 
  ggplot(aes(has_experimental_data)) +
  geom_bar()

Good news, most substances have experimental data.

Occasionally several LD50 values are reported within the same study record, e.g. because the LD50 of (diluted) preparations/mixtures is reported as well or because there is a gender difference. The former is a problem for the analysis, the latter is not. For the purpose of this analysis it is a viable solution to retain only the single lowest value from each study record.

data_oral_combined <- data_oral_combined %>% 
  filter(unit == "mg/kg bw") %>% 
  group_by(link) %>% # link can be used for grouping 
  filter(effect_level_dose == min(effect_level_dose)) %>% 
  distinct(effect_level_dose, .keep_all = TRUE) %>% 
  ungroup
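
The effect of this filter can be followed on a toy example. The sketch below uses base R instead of dplyr and made-up study records; the link values are placeholders, not real endpoint links.

```r
# Made-up study records: study-A reports two LD50 values, study-B one
records <- data.frame(
  link = c("study-A", "study-A", "study-B"),
  effect_level_dose = c(2000, 500, 300),
  stringsAsFactors = FALSE
)

# Lowest dose per study record, repeated for every row of that record
lowest_per_record <- ave(records$effect_level_dose, records$link, FUN = min)

# Keep only the rows carrying the lowest value, one row per record and value
lowest <- records[records$effect_level_dose == lowest_per_record, ]
lowest <- lowest[!duplicated(lowest[c("link", "effect_level_dose")]), ]

print(lowest)
```

For study-A only the value of 500 survives, which is the intended conservative choice when a record also contains values for diluted preparations.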

1.8.1 Visualization of derived category from every study with an LD50 value

data_oral_combined %>% 
  filter(!is.na(clp_class)) %>% #remove all without harmonized classif.
  ggplot(aes(x = factor(clp_class), y = category_from_exp)) +
  geom_jitter()

data_oral_combined %>% 
  filter(!is.na(clp_class)) %>% #remove all without harmonized classif.
  filter(!is.na(category_from_exp)) %>% # remove all studies without a category derived from experimental data; "not classified" is kept
  group_by(cas) %>% 
  summarize(single_category = mean(as.numeric(str_replace(category_from_exp, "not classified", "5")), na.rm = TRUE), # na.rm = TRUE is redundant; "not classified" is treated as numerical value 5
            clp_class = tail(names(sort(table(clp_class))), 1),
            n_experiment = sum(!is.na(category_from_exp))
            ) %>% 
 ggplot(aes(x = factor(clp_class), y = single_category, size = n_experiment)) +
  geom_point(alpha = 0.5, position = position_jitter())

It is already obvious that a large number of substances can quite reliably be classified into a stricter category than the one assigned by the minimum classification. Nearly all of these substances would “jump” 1 category; however, some currently classified as “Acute Tox. 4 *” would probably jump to cat. 2 (indicated by several experimental studies with a mean derived category < 3).

Identifying these candidates for concern by a constructed variable:
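
The y-axis of this plot is the mean of the derived categories per substance, with “not classified” counted as 5. A worked example in base R, with made-up categories for a single substance:

```r
# Derived categories of three hypothetical studies for one substance
categories <- c("4", "not classified", "3")

# "not classified" is treated as numerical value 5 before averaging
numeric_categories <- as.numeric(sub("not classified", "5", categories, fixed = TRUE))

single_category <- mean(numeric_categories)   # (4 + 5 + 3) / 3
print(single_category)
```

A mean is a pragmatic summary here; it treats the category numbers as if they were evenly spaced, which they are not on the dose scale.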

data_oral_grouped <- data_oral_combined %>% 
  filter(!is.na(clp_class)) %>% #remove all without harmonized classif.
  filter(!is.na(category_from_exp)) %>% # remove all studies without a category derived from experimental data; "not classified" is kept
  group_by(cas) %>% 
  summarize(single_category = mean(as.numeric(str_replace(category_from_exp, "not classified", "5")), na.rm = TRUE), # na.rm = TRUE is redundant; "not classified" is treated as numerical value 5
            clp_class = tail(names(sort(table(clp_class))), 1),
            n_experiment = sum(!is.na(category_from_exp))
            ) %>% 
  # concern = 2: mean experimental category is more than 1 category below the current classification
  # concern = 1: mean experimental category is below the current classification, but by at most 1 category
  # concern = 0: else
  mutate(numeric_clp_class = as.numeric(str_extract(clp_class, "\\d")), # category number of the harmonized classification
         concern = ifelse(single_category < numeric_clp_class - 1, 2, 
                         ifelse(single_category < numeric_clp_class, 1, 0)
                          )
         )

# plot
data_oral_grouped %>% 
  ggplot(aes(x = factor(clp_class), y = single_category, size = n_experiment, color = factor(concern))) +
  geom_point(alpha = 0.4, position = position_jitter())
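
How the concern variable reacts to different mean categories can be checked in isolation. The sketch below wraps the same nested ifelse in a hypothetical helper function concern_score and applies it to made-up mean categories for a substance classified in category 4.

```r
# Concern score as constructed above:
# 2 = mean experimental category more than 1 below the harmonized category,
# 1 = below the harmonized category by at most 1, 0 = otherwise
concern_score <- function(single_category, numeric_clp_class) {
  ifelse(single_category < numeric_clp_class - 1, 2,
         ifelse(single_category < numeric_clp_class, 1, 0))
}

# Made-up mean categories, all compared against "Acute Tox. 4 *"
scores <- concern_score(c(1.5, 3.5, 3, 4, 5), 4)
print(scores)
```

Note that a mean category of exactly 3 against category 4 still gives a score of 1, because the comparison for the score of 2 is strict.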

1.8.2 Closer look at those with concern ≥ 1

Extend the data on single experiments with the derived info from grouping by substances and plot the substances of concern.

data_oral_extended <- data_oral_combined %>% 
  inner_join(data_oral_grouped, by = c("cas", "clp_class"))

Aligned dot plot for visualization, with the number of studies as dot size:

plot_substances_of_concern <- data_oral_extended %>% 
  filter(concern >= 1) %>% 
  arrange(single_category) %>% 
  ggplot(aes(x = fct_reorder(substance, single_category, min), y = category_from_exp)) + # fct_reorder to order factor from lowest single_category to highest
  geom_count(aes(size = ..n..,
                 colour = as.factor(numeric_clp_class))) + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_x_discrete(label = function(x) substring(x, 1, 20)) +
  theme(legend.text=element_text(size=rel(0.8))) + # smaller legend font, to give more space to the plot 
  labs(colour = "original\nclassification",
       size = "amount\nof studies") +
  theme(legend.position = "bottom", legend.box.just = "center") +
  ylab("Category derived\nfrom experimental data") + 
  xlab("Substance")

plot_substances_of_concern

1.8.3 Interactive visualization and comparison with registration dossiers

plot_substances_of_concern_plotly <- data_oral_extended %>% 
  filter(concern >= 1) %>% 
#  arrange(single_category) %>% 
  group_by(substance, category_from_exp) %>% 
  mutate(
    plot_label = paste(paste(ifelse(is.na(effect_level_quantifier), "  ", effect_level_quantifier)
                              , effect_level_dose, unit, "- reliability:", reliability, sep = " "),
                        sep = "", collapse = "\n")
  ) %>% 
  ggplot(aes(x = fct_reorder(substance, single_category, min), y = category_from_exp, colour = as.factor(numeric_clp_class), label = plot_label))  + # fct_reorder to order factor from lowest single_category to highest
  geom_count() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_x_discrete(label = function(x) substring(x, 1, 20)) +
  labs(colour = "original\nclassification",
       size = "amount\nof studies") +
  theme(legend.position = "bottom") +
  ylab("Category derived\nfrom experimental data") + 
  xlab("Substance")
  

ggplotly(plot_substances_of_concern_plotly,
         tooltip = c("label")) %>% # instead of "all"
  layout(legend = list(orientation = "h",
                       y = -.8))  # ToDo: does not show both legends,  problem between ggplot output -> plotly

1.9 Comparing the substances with concern > 1 with registration data

data_oral_extended %>% 
  filter(concern >= 2) %>% 
  select(substance, ec, effect_level_quantifier, effect_level_dose, unit, reliability, numeric_clp_class, category_from_exp, single_category, link) %>% 
  arrange(single_category, substance, reliability)
## # A tibble: 18 x 10
##    substance ec    effect_level_qu… effect_level_do… unit  reliability
##    <chr>     <chr> <chr>                       <dbl> <chr> <chr>      
##  1 1,2-epox… 203-… >                               1 mg/k… 2          
##  2 1,2-epox… 203-… ca.                           900 mg/k… 2          
##  3 peraceti… 201-… >                              30 mg/k… 1          
##  4 peraceti… 201-… <NA>                           36 mg/k… 1          
##  5 peraceti… 201-… <NA>                           78 mg/k… 1          
##  6 peraceti… 201-… <NA>                          271 mg/k… 1          
##  7 peraceti… 201-… <NA>                           14 mg/k… 1          
##  8 peraceti… 201-… <NA>                            5 mg/k… 1          
##  9 peraceti… 201-… >                               7 mg/k… 1          
## 10 peraceti… 201-… <NA>                          183 mg/k… 1          
## 11 peraceti… 201-… <NA>                           93 mg/k… 1          
## 12 peraceti… 201-… <NA>                           63 mg/k… 1          
## 13 peraceti… 201-… <NA>                           43 mg/k… 1          
## 14 peraceti… 201-… >                              22 mg/k… 2          
## 15 peraceti… 201-… <NA>                           17 mg/k… 2          
## 16 peraceti… 201-… <                              67 mg/k… 2          
## 17 peraceti… 201-… <NA>                          152 mg/k… 2          
## 18 peraceti… 201-… <NA>                          239 mg/k… 2          
## # … with 4 more variables: numeric_clp_class <dbl>,
## #   category_from_exp <chr>, single_category <dbl>, link <chr>

Only 2 substances show a numerical difference of 2 or more categories between the current CLH and the classification derived from experimental data. For 1,2-epoxybutane there are only 2 experimental values, and the difference in the effective dose is huge (> 1 vs. ca. 900 mg/kg bw)! Suspicion: some entries are reported in a different unit, which would explain such severe differences in effect levels.

data_oral_extended %>% 
  pull(unit) %>% 
  table
## .
## mg/kg bw 
##      801

Nope, all 801 values are reported in mg/kg bw. A closer look at 1,2-epoxybutane and peracetic acid:

1.9.1 1,2-epoxybutane

Interesting case: eChemPortal exported 2 studies with the same reliability but a tremendous difference in the effect level relevant for classification. Looking at the dossier https://echa.europa.eu/registration-dossier/-/registered-dossier/13837//?documentUUID=7f00ec16-d9ee-4ab2-b8a9-35fd844a73e5 shows that there are actually 4 studies with RL 1 or 2, although only 2 can be found in eChemPortal (possibly because eChemPortal lags a bit behind in getting data from ECHA and these studies were added to the ECHA database after the last eChemPortal update). The UUIDs also seem to have changed.
The reason for the big difference in the effect level is actually a mistake in the dossier: the LD50 is reported as 1 mg/kg bw, but is 1 g/kg bw (obvious when looking at the free-text field “mortality”). Great start… the first substance and already a mistake in the registration dossier, and only 2 of the 4 available studies are covered by the experimental data set.
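A reporting mistake like this one shows up as an implausibly large spread between the effect levels of a single substance. The following is a minimal sketch of such a sanity check, using toy rows that mirror values shown in this write-up. The cut-off ratio of 100 is an arbitrary assumption and not part of the original analysis.

```r
library(dplyr)

# Toy rows mirroring the exported values for two substances (mg/kg bw)
toy <- data.frame(
  substance = c("1,2-epoxybutane", "1,2-epoxybutane", "nickel sulfate", "nickel sulfate"),
  effect_level_dose = c(1, 900, 361, 275),
  stringsAsFactors = FALSE
)

# Flag substances whose highest and lowest effect level differ by a factor >= 100
# (the factor 100 is an assumed, arbitrary threshold)
spread_check <- toy %>%
  group_by(substance) %>%
  summarise(
    n_values = n(),
    ratio_max_min = max(effect_level_dose) / min(effect_level_dose)
  ) %>%
  filter(ratio_max_min >= 100)

spread_check
```

A large ratio does not prove a unit error; it only marks a substance for a manual look at the dossier. Tests on formulations, as reported for peracetic acid, also widen the spread.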

1.9.2 peracetic acid

A huge number of studies indicate category 2 and 3, several of them with RL 1. The UUIDs don’t match for this substance either. An interesting case as well, as LD50 values of several formulations are reported (with consequently considerably higher LD50 values than for the neat substance). Filtering out these values by keeping only the lowest seems to work reasonably well. Overall, a classification as category 2 seems to be very well justified by the experimental data.

1.10 Comparing some of the substances with concern = 1 to the registration data

data_oral_extended %>% 
  filter(concern == 1) %>% 
  select(substance, ec, effect_level_quantifier, effect_level_dose, unit, reliability, numeric_clp_class, category_from_exp, single_category, link) %>% 
  arrange(single_category, substance, reliability) %>% 
  pull(substance) %>% 
  unique
##  [1] "hydrogen cyanide ...%; hydrocyanic acid ...%"                                           
##  [2] "ethyleneimine; aziridine"                                                               
##  [3] "hexachloroplatinic acid"                                                                
##  [4] "acrylonitrile"                                                                          
##  [5] "dinoseb (ISO); 6-sec-butyl-2,4-dinitrophenol"                                           
##  [6] "potassium fluoride"                                                                     
##  [7] "cadmium chloride"                                                                       
##  [8] "chromium (VI) trioxide"                                                                 
##  [9] "sodium fluoride"                                                                        
## [10] "(2,6-xylyloxy) acetic acid"                                                             
## [11] "1,2,3-trichloropropane"                                                                 
## [12] "2-(7-ethyl-1H-indol-3-yl)ethanol"                                                       
## [13] "2-amino-4,6-dinitrophenol; picramic acid"                                               
## [14] "2-butyryl-3-hydroxy-5-thiocyclohexan-3-yl-cyclohex-2-en-1-one"                          
## [15] "2,3-dihydro-2,2-dimethyl-1H-perimidine"                                                 
## [16] "3-chloropropene; allyl chloride"                                                        
## [17] "3,5-dichloro-2,4-difluorobenzoyl fluoride"                                              
## [18] "4-nitrophenol; p-nitrophenol"                                                           
## [19] "5-amino-6-methyl-1,3-dihydrobenzoimidazol-2-one"                                        
## [20] "barium sulphide"                                                                        
## [21] "chloroprene (stabilised); 2-chlorobuta-1,3-diene (stabilised)"                          
## [22] "cobalt oxide"                                                                           
## [23] "dapsone; 4,4'-diamino diphenyl sulfone"                                                 
## [24] "dicyclohexylamine"                                                                      
## [25] "diethylmethoxyborane"                                                                   
## [26] "thionyl dichloride; thionyl chloride"                                                   
## [27] "ziram (ISO); zinc bis dimethyldithiocarbamate"                                          
## [28] "1,3-diphenylguanidine"                                                                  
## [29] "3-cyano-3,5,5-trimethylcyclohexanone"                                                   
## [30] "caffeine"                                                                               
## [31] "methyl chloroformate"                                                                   
## [32] "bronopol (INN); 2-bromo-2-nitropropane-1,3-diol"                                        
## [33] "1-phenyl-3-pyrazolidone"                                                                
## [34] "disodium sulfide; sodium sulfide"                                                       
## [35] "ethyl chloroformate"                                                                    
## [36] "N,N-bis(3-aminopropyl)methylamine"                                                      
## [37] "nickel sulfate"                                                                         
## [38] "1,3-propanesultone; 1,2-oxathiolane 2,2-dioxide"                                        
## [39] "3a,4,7,7a-tetrahydro-4,7-methanoindene"                                                 
## [40] "cobalt dichloride"                                                                      
## [41] "divanadium pentaoxide; vanadium pentoxide"                                              
## [42] "metam-sodium (ISO); sodium methyldithiocarbamate"                                       
## [43] "acetic anhydride"                                                                       
## [44] "allyl glycidyl ether; allyl 2,3-epoxypropyl ether; prop-2-en-1-yl 2,3-epoxypropyl ether"
## [45] "hydroxylamine  ....% [> 55 % in aqueous solution]"                                      
## [46] "iron (II) sulfate"                                                                      
## [47] "acrylic acid; prop-2-enoic acid"                                                        
## [48] "ethyl acrylate"                                                                         
## [49] "methyl acrylate; methyl propenoate"

Many important substances are in this list. Let’s look at some examples.

1.10.1 Hydrogen Cyanide

data_oral_extended %>% 
  filter(concern == 1) %>% 
  filter(str_detect(substance, "(?i)hydrogen\\scyanide")) %>% 
  select(substance, ec, effect_level_quantifier, effect_level_dose, unit, reliability, numeric_clp_class, category_from_exp, single_category, link) %>% 
  arrange(single_category, substance, reliability)
## # A tibble: 2 x 10
##   substance ec    effect_level_qu… effect_level_do… unit  reliability
##   <chr>     <chr> <chr>                       <dbl> <chr> <chr>      
## 1 hydrogen… 200-… >=                              3 mg/k… 2          
## 2 hydrogen… 200-… >=                              2 mg/k… 2          
## # … with 4 more variables: numeric_clp_class <dbl>,
## #   category_from_exp <chr>, single_category <dbl>, link <chr>

There are experimental data from 2 studies. Both values quite clearly indicate cat. 1 instead of cat. 2, the current CLH category. The registration data confirm these values (with a “creative” way of reporting the effect level). It looks like HCN is currently not appropriately classified in the CLP.

1.10.2 Acrylonitrile

data_oral_extended %>% 
  filter(concern == 1) %>% 
  filter(str_detect(substance, "(?i)acrylonitrile")) %>%
  select(substance, ec, effect_level_quantifier, effect_level_dose, unit, reliability, numeric_clp_class, category_from_exp, single_category, link) %>% 
  arrange(single_category, substance, reliability)
## # A tibble: 2 x 10
##   substance ec    effect_level_qu… effect_level_do… unit  reliability
##   <chr>     <chr> <chr>                       <dbl> <chr> <chr>      
## 1 acryloni… 203-… <NA>                           95 mg/k… 2          
## 2 acryloni… 203-… <NA>                           25 mg/k… 2          
## # … with 4 more variables: numeric_clp_class <dbl>,
## #   category_from_exp <chr>, single_category <dbl>, link <chr>

2 experimental values were exported from eChemPortal, both with the same reliability: the first corresponds to cat. 3, the second to cat. 2. These values fit the data reported in the ECHA database. The second value (indicating category 2 instead of 3) is the lowest of several values given in this study record in the ECHA data (results from several studies are reported within the same endpoint study record, which is certainly not ideal). Further, according to the result description, these results come from studies of different quality, and the most reliable value is not the 25 mg/kg bw but a higher one. Thus, category 3 seems justified.

1.10.3 chromium (VI) trioxide

data_oral_extended %>% 
  filter(concern == 1) %>% 
  filter(str_detect(substance, "(?i)chromium.*trioxide")) %>%
  select(substance, ec, effect_level_quantifier, effect_level_dose, unit, reliability, numeric_clp_class, category_from_exp, single_category, link) %>% 
  arrange(single_category, substance, reliability)
## # A tibble: 4 x 10
##   substance ec    effect_level_qu… effect_level_do… unit  reliability
##   <chr>     <chr> <chr>                       <dbl> <chr> <chr>      
## 1 chromium… 215-… <NA>                           67 mg/k… 1          
## 2 chromium… 215-… <NA>                           39 mg/k… 2          
## 3 chromium… 215-… <NA>                           80 mg/k… 2          
## 4 chromium… 215-… <NA>                           52 mg/k… 2          
## # … with 4 more variables: numeric_clp_class <dbl>,
## #   category_from_exp <chr>, single_category <dbl>, link <chr>

One RL 1 study indicates category 3; three RL 2 studies give values indicating cat. 2/3. Current CLH: category 3. Reporting in the dossier is again not ideal for automated processing: for one study record, the reported effect levels correspond to several chromates, and without processing the natural language it is not possible to determine which one belongs to chromium (VI) trioxide. Overall, this seems like a false alarm and cat. 3 is justified.

1.10.4 Nickel sulfate

data_oral_extended %>% 
  filter(concern == 1) %>% 
  filter(str_detect(substance, "(?i)nickel\\ssulfate")) %>%
  select(substance, ec, effect_level_quantifier, effect_level_dose, unit, reliability, numeric_clp_class, category_from_exp, single_category, link) %>% 
  arrange(single_category, substance, reliability)
## # A tibble: 2 x 10
##   substance ec    effect_level_qu… effect_level_do… unit  reliability
##   <chr>     <chr> <chr>                       <dbl> <chr> <chr>      
## 1 nickel s… 232-… <NA>                          361 mg/k… 1          
## 2 nickel s… 232-… <NA>                          275 mg/k… 2          
## # … with 4 more variables: numeric_clp_class <dbl>,
## #   category_from_exp <chr>, single_category <dbl>, link <chr>

Two experimental values from eChemPortal. They fit the dossier; the first value is from a modern guideline study. The second is also from a quite reliable source, and its lowest LD50 value is just below the lower limit for cat. 4. CLH cat. 4 seems justified, although a conservative approach could lead to cat. 3.
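The category limits referred to here are the CLP cut-offs for acute oral toxicity (LD50 ≤ 5 mg/kg bw for cat. 1, ≤ 50 for cat. 2, ≤ 300 for cat. 3, ≤ 2000 for cat. 4). A minimal sketch of the mapping from an LD50 to a category follows; the helper name `clp_oral_category()` is hypothetical and not taken from the original analysis code.

```r
# Upper LD50 bounds (mg/kg bw) of the CLP acute oral toxicity categories 1 to 4
clp_oral_upper_bounds <- c(5, 50, 300, 2000)

# Hypothetical helper: returns 1 to 4 for classified values
# and 5 for LD50 > 2000 mg/kg bw (not classified)
clp_oral_category <- function(ld50) {
  # left.open = TRUE makes the intervals (lower, upper],
  # so an LD50 of exactly 300 still falls into cat. 3
  findInterval(ld50, clp_oral_upper_bounds, left.open = TRUE) + 1L
}

clp_oral_category(c(361, 275))  # the two nickel sulfate values: 4 3
```

The sketch ignores the effect level quantifier (`>`, `<`, `ca.`), which would need separate handling before a value is used for classification.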

1.10.5 Vanadium pentoxide

data_oral_extended %>% 
  filter(concern == 1) %>% 
  filter(str_detect(substance, "(?i)vanadium\\spentoxide")) %>%
  select(substance, ec, effect_level_quantifier, effect_level_dose, unit, reliability, numeric_clp_class, category_from_exp, single_category, link) %>% 
  arrange(single_category, substance, reliability)
## # A tibble: 3 x 10
##   substance ec    effect_level_qu… effect_level_do… unit  reliability
##   <chr>     <chr> <chr>                       <dbl> <chr> <chr>      
## 1 divanadi… 215-… <NA>                          466 mg/k… 1          
## 2 divanadi… 215-… <NA>                          221 mg/k… 1          
## 3 divanadi… 215-… <NA>                          658 mg/k… 1          
## # … with 4 more variables: numeric_clp_class <dbl>,
## #   category_from_exp <chr>, single_category <dbl>, link <chr>

I included V2O5 because at the time of this write-up a CLH dossier had just been submitted by France, so I can directly compare with an elaborated and recent opinion: https://echa.europa.eu/de/registry-of-clh-intentions-until-outcome/-/dislist/details/0b0236e1814e26d1. France proposes oral toxicity cat. 3 (and, by the way, a stricter classification for CMR properties). The CLH report is not yet available. The dossier lists 3 high-quality studies, whose lowest results were correctly processed in this analysis. They all use different test materials but come from the same lab with the same protocol, so they should be directly comparable. The lowest value comes from technical grade V2O5. Presumably this result was chosen as the critical one for classification, so that the CLH appropriately covers diverse qualities of V2O5.

1.10.6 Ethyl acrylate

data_oral_extended %>% 
  filter(concern == 1) %>% 
  filter(str_detect(substance, "(?i)^ethyl.*acrylate")) %>%
  select(substance, ec, effect_level_quantifier, effect_level_dose, unit, reliability, numeric_clp_class, category_from_exp, single_category, link) %>% 
  arrange(single_category, substance, reliability)
## # A tibble: 10 x 10
##    substance ec    effect_level_qu… effect_level_do… unit  reliability
##    <chr>     <chr> <chr>                       <dbl> <chr> <chr>      
##  1 ethyl ac… 205-… <NA>                          280 mg/k… 2          
##  2 ethyl ac… 205-… <NA>                         1800 mg/k… 2          
##  3 ethyl ac… 205-… <NA>                         1020 mg/k… 2          
##  4 ethyl ac… 205-… <NA>                          461 mg/k… 2          
##  5 ethyl ac… 205-… >                             184 mg/k… 2          
##  6 ethyl ac… 205-… <NA>                         1800 mg/k… 2          
##  7 ethyl ac… 205-… ca.                           554 mg/k… 2          
##  8 ethyl ac… 205-… <NA>                          900 mg/k… 2          
##  9 ethyl ac… 205-… <NA>                         1120 mg/k… 2          
## 10 ethyl ac… 205-… >                             900 mg/k… 2          
## # … with 4 more variables: numeric_clp_class <dbl>,
## #   category_from_exp <chr>, single_category <dbl>, link <chr>

I wanted to take a closer look at ethyl acrylate because of the high number of reliable studies available: 10 experimental results with RL 2 were exported from eChemPortal. I didn’t compare all results to the dossier, but it looks like all values were exported correctly. However, some of the studies are quite old, some report very few study details, and some used low animal numbers… so probably not all of the 10 studies should be considered equal in reliability. Regarding the classification, the lower values seem to come from exactly those studies with lower reliability, so the current CLH with cat. 4 is likely appropriate even though 2 studies indicate cat. 3. But this might need a closer look.

2 Conclusion

2.1 Suitability of eChemPortal as a data source for acute toxicity

  • Data export from eChemPortal is quite labour-intensive for larger/untargeted datasets. There is no API available and the number of returned results is limited, which makes it necessary to create several data queries. Every variable that needs to be exported has to be queried specifically with a target value. This means that for any categorical variable several query blocks have to be constructed, one for each category of interest. Overall, retrieving the data from eChemPortal is possible, but there are hurdles.
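  • I did not encounter any mistakes introduced by eChemPortal itself. This is expected, as eChemPortal uses structured IUCLID data from ECHA. Errors such as the wrong LD50 unit for 1,2-epoxybutane originate in the registration dossier.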
    • However, it seems EC Numbers are missing sometimes. No idea why.
  • The data in eChemPortal is not up to date with the ECHA database. It looks like more recent data than the 2017 IUCLID database dump is found, but registrations or updates after April or May 2018 seem to be missing. For assessing the appropriateness of the minimum classification this is, at most, only a minor drawback.

2.2 Using experimental data to flag inappropriate minimum classifications

  • The core idea of this analysis was to use experimental data from registration dossiers to identify substances where the minimum classification does not fit the actual data available. For the data on oral toxicity this worked reasonably well, but there are exceptions.
  • However, there seems to be quite a number of false positives, as a closer look at the data reveals: substances are flagged as having an inappropriate CLH classification even though it actually fits the data. The criteria I chose in this analysis (i.e. always selecting the lowest value within one endpoint study record) aim for a conservative result, preferring some false positives over missing some real positives. Further improving the accuracy is limited by the data structure of the disseminated registration data and would probably involve natural language processing of the free-text fields (which are probably not included in the eChemPortal data anyway; ToDo: check).
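The conservative selection rule described above can be sketched as follows. The toy rows are made up for illustration, and the column name `study_uuid` is an assumption standing in for whatever identifies an endpoint study record in the export.

```r
library(dplyr)

# Toy data: record "s1" reports two effect levels, "s2" and "s3" one each
toy <- data.frame(
  substance = c("A", "A", "A", "B"),
  study_uuid = c("s1", "s1", "s2", "s3"),  # hypothetical record identifier
  effect_level_dose = c(25, 95, 80, 361),  # mg/kg bw, illustrative values
  stringsAsFactors = FALSE
)

# Keep only the lowest effect level within each endpoint study record
lowest_per_record <- toy %>%
  group_by(substance, study_uuid) %>%
  summarise(effect_level_dose = min(effect_level_dose)) %>%
  ungroup()

lowest_per_record
```

Taking the minimum is what makes the approach conservative: a record that mixes results of different quality or for different test materials is represented by its most toxic value, which is also where the false positives come from.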

2.3 Validation of the hits by comparison with other sources

  • I manually checked some of the hits. To avoid manual checking, a comparison with the self-classification of the registrants is probably very helpful. In fact, N. Neuwirth & J. Püringer from the Austrian AUVA used this method to identify inappropriate minimum classifications in the first place (Neuwirth and Püringer 2016). Registrants are obliged to report a “correct” self-classification if they have data that do not agree with the minimum classification. I will probably follow up on this later and compare the two methods or validate the hits.
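Such a comparison could look roughly like the sketch below. Everything in it is hypothetical: the identifiers, the `self_classification` table and its `category_self` column are invented for illustration and do not represent real registrant data.

```r
library(dplyr)

# Flagged substances with placeholder identifiers (not real CAS numbers)
hits <- data.frame(
  cas = c("cas-A", "cas-B"),
  category_minimum = c(2, 3),
  category_experimental = c(1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical self-classifications of the registrants (invented values)
self_classification <- data.frame(
  cas = c("cas-A", "cas-B"),
  category_self = c(1, 3),
  stringsAsFactors = FALSE
)

validated <- hits %>%
  left_join(self_classification, by = "cas") %>%
  # a hit counts as supported if registrants also classify
  # more strictly than the minimum classification
  mutate(supported_by_self_class = category_self < category_minimum)

validated
```

In this toy example only the first substance would be confirmed; the second would be a candidate false positive.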

2.4 List of results

The following list contains all the substances that were flagged as having an inappropriate (minimum) classification, without correcting for the detected false positives.

data_oral_extended %>% 
  filter(concern >= 1) %>%
  select(substance, cas, numeric_clp_class, single_category) %>%   
  group_by(substance, cas) %>% 
  summarize(
    category_minimum = unique(numeric_clp_class),
    category_experimental = floor(unique(single_category))
    ) %>% 
  arrange(category_experimental, substance) %>% 
  kable
| substance | cas | category_minimum | category_experimental |
|---|---|---|---|
| ethyleneimine; aziridine | 151-56-4 | 2 | 1 |
| hydrogen cyanide …%; hydrocyanic acid …% | 74-90-8 | 2 | 1 |
| 1,2-epoxybutane | 106-88-7 | 4 | 2 |
| acrylonitrile | 107-13-1 | 3 | 2 |
| cadmium chloride | 10108-64-2 | 3 | 2 |
| chromium (VI) trioxide | 1333-82-0 | 3 | 2 |
| dinoseb (ISO); 6-sec-butyl-2,4-dinitrophenol | 88-85-7 | 3 | 2 |
| hexachloroplatinic acid | 16941-12-1 | 3 | 2 |
| peracetic acid . . . % | 79-21-0 | 4 | 2 |
| potassium fluoride | 7789-23-3 | 3 | 2 |
| sodium fluoride | 7681-49-4 | 3 | 2 |
| (2,6-xylyloxy) acetic acid | 13335-71-2 | 4 | 3 |
| 1-phenyl-3-pyrazolidone | 92-43-3 | 4 | 3 |
| 1,2,3-trichloropropane | 96-18-4 | 4 | 3 |
| 1,3-diphenylguanidine | 102-06-7 | 4 | 3 |
| 1,3-propanesultone; 1,2-oxathiolane 2,2-dioxide | 1120-71-4 | 4 | 3 |
| 2-(7-ethyl-1H-indol-3-yl)ethanol | 41340-36-7 | 4 | 3 |
| 2-amino-4,6-dinitrophenol; picramic acid | 96-91-3 | 4 | 3 |
| 2-butyryl-3-hydroxy-5-thiocyclohexan-3-yl-cyclohex-2-en-1-one | 94723-86-1 | 4 | 3 |
| 2,3-dihydro-2,2-dimethyl-1H-perimidine | 6364-17-6 | 4 | 3 |
| 3-chloropropene; allyl chloride | 107-05-1 | 4 | 3 |
| 3-cyano-3,5,5-trimethylcyclohexanone | 7027-11-4 | 4 | 3 |
| 3,5-dichloro-2,4-difluorobenzoyl fluoride | 101513-70-6 | 4 | 3 |
| 3a,4,7,7a-tetrahydro-4,7-methanoindene | 77-73-6 | 4 | 3 |
| 4-nitrophenol; p-nitrophenol | 100-02-7 | 4 | 3 |
| 5-amino-6-methyl-1,3-dihydrobenzoimidazol-2-one | 67014-36-2 | 4 | 3 |
| acetic anhydride | 108-24-7 | 4 | 3 |
| acrylic acid; prop-2-enoic acid | 79-10-7 | 4 | 3 |
| allyl glycidyl ether; allyl 2,3-epoxypropyl ether; prop-2-en-1-yl 2,3-epoxypropyl ether | 106-92-3 | 4 | 3 |
| barium sulphide | 21109-95-5 | 4 | 3 |
| bronopol (INN); 2-bromo-2-nitropropane-1,3-diol | 52-51-7 | 4 | 3 |
| caffeine | 58-08-2 | 4 | 3 |
| chloroprene (stabilised); 2-chlorobuta-1,3-diene (stabilised) | 126-99-8 | 4 | 3 |
| cobalt dichloride | 7646-79-9 | 4 | 3 |
| cobalt oxide | 1307-96-6 | 4 | 3 |
| dapsone; 4,4’-diamino diphenyl sulfone | 80-08-0 | 4 | 3 |
| dicyclohexylamine | 101-83-7 | 4 | 3 |
| diethylmethoxyborane | 7397-46-8 | 4 | 3 |
| disodium sulfide; sodium sulfide | 1313-82-2 | 4 | 3 |
| divanadium pentaoxide; vanadium pentoxide | 1314-62-1 | 4 | 3 |
| ethyl acrylate | 140-88-5 | 4 | 3 |
| ethyl chloroformate | 541-41-3 | 4 | 3 |
| hydroxylamine ….% [> 55 % in aqueous solution] | 7803-49-8 | 4 | 3 |
| iron (II) sulfate | 7720-78-7 | 4 | 3 |
| metam-sodium (ISO); sodium methyldithiocarbamate | 137-42-8 | 4 | 3 |
| methyl acrylate; methyl propenoate | 96-33-3 | 4 | 3 |
| methyl chloroformate | 79-22-1 | 4 | 3 |
| N,N-bis(3-aminopropyl)methylamine | 105-83-9 | 4 | 3 |
| nickel sulfate | 7786-81-4 | 4 | 3 |
| thionyl dichloride; thionyl chloride | 7719-09-7 | 4 | 3 |
| ziram (ISO); zinc bis dimethyldithiocarbamate | 137-30-4 | 4 | 3 |

3 Session info

devtools::session_info()
## ─ Session info ──────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.4.4 (2018-03-15)
##  os       Ubuntu 18.04.1 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Europe/Berlin               
##  date     2019-04-30                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
##  package     * version date       lib source        
##  assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.4.4)
##  backports     1.1.3   2018-12-14 [1] CRAN (R 3.4.4)
##  broom         0.5.1   2018-12-05 [1] CRAN (R 3.4.4)
##  callr         3.1.1   2018-12-21 [1] CRAN (R 3.4.4)
##  cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.4.4)
##  cli           1.0.1   2018-09-25 [1] CRAN (R 3.4.4)
##  colorspace    1.4-0   2019-01-13 [1] CRAN (R 3.4.4)
##  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.4.4)
##  crosstalk     1.0.0   2016-12-21 [1] CRAN (R 3.4.4)
##  data.table    1.12.0  2019-01-13 [1] CRAN (R 3.4.4)
##  desc          1.2.0   2018-05-01 [1] CRAN (R 3.4.4)
##  devtools      2.0.1   2018-10-26 [1] CRAN (R 3.4.4)
##  digest        0.6.18  2018-10-10 [1] CRAN (R 3.4.4)
##  dplyr       * 0.8.0.1 2019-02-15 [1] CRAN (R 3.4.4)
##  ellipsis      0.1.0   2019-02-19 [1] CRAN (R 3.4.4)
##  evaluate      0.13    2019-02-12 [1] CRAN (R 3.4.4)
##  fansi         0.4.0   2018-10-05 [1] CRAN (R 3.4.4)
##  forcats     * 0.4.0   2019-02-17 [1] CRAN (R 3.4.4)
##  fs            1.2.6   2018-08-23 [1] CRAN (R 3.4.4)
##  generics      0.0.2   2018-11-29 [1] CRAN (R 3.4.4)
##  ggplot2     * 3.1.0   2018-10-25 [1] CRAN (R 3.4.4)
##  glue          1.3.0   2018-07-17 [1] CRAN (R 3.4.4)
##  gtable        0.2.0   2016-02-26 [1] CRAN (R 3.4.4)
##  haven         2.1.0   2019-02-19 [1] CRAN (R 3.4.4)
##  highr         0.7     2018-06-09 [1] CRAN (R 3.4.4)
##  hms           0.4.2   2018-03-10 [1] CRAN (R 3.4.4)
##  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.4.4)
##  htmlwidgets   1.3     2018-09-30 [1] CRAN (R 3.4.4)
##  httpuv        1.4.5.1 2018-12-18 [1] CRAN (R 3.4.4)
##  httr          1.4.0   2018-12-11 [1] CRAN (R 3.4.4)
##  jsonlite      1.6     2018-12-07 [1] CRAN (R 3.4.4)
##  knitr       * 1.21    2018-12-10 [1] CRAN (R 3.4.4)
##  labeling      0.3     2014-08-23 [1] CRAN (R 3.4.4)
##  later         0.8.0   2019-02-11 [1] CRAN (R 3.4.4)
##  lattice       0.20-35 2017-03-25 [4] CRAN (R 3.4.2)
##  lazyeval      0.2.1   2017-10-29 [1] CRAN (R 3.4.4)
##  lubridate     1.7.4   2018-04-11 [1] CRAN (R 3.4.4)
##  magrittr      1.5     2014-11-22 [1] CRAN (R 3.4.4)
##  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.4.4)
##  mime          0.6     2018-10-05 [1] CRAN (R 3.4.4)
##  modelr        0.1.4   2019-02-18 [1] CRAN (R 3.4.4)
##  munsell       0.5.0   2018-06-12 [1] CRAN (R 3.4.4)
##  nlme          3.1-131 2017-02-06 [4] CRAN (R 3.4.2)
##  pillar        1.3.1   2018-12-15 [1] CRAN (R 3.4.4)
##  pkgbuild      1.0.2   2018-10-16 [1] CRAN (R 3.4.4)
##  pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.4.4)
##  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.4.4)
##  plotly      * 4.8.0   2018-07-20 [1] CRAN (R 3.4.4)
##  plyr          1.8.4   2016-06-08 [1] CRAN (R 3.4.4)
##  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.4.4)
##  processx      3.2.1   2018-12-05 [1] CRAN (R 3.4.4)
##  promises      1.0.1   2018-04-13 [1] CRAN (R 3.4.4)
##  ps            1.3.0   2018-12-21 [1] CRAN (R 3.4.4)
##  purrr       * 0.3.1   2019-03-03 [1] CRAN (R 3.4.4)
##  R6            2.4.0   2019-02-14 [1] CRAN (R 3.4.4)
##  Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.4.4)
##  readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.4.4)
##  readxl      * 1.3.0   2019-02-15 [1] CRAN (R 3.4.4)
##  remotes       2.0.2   2018-10-30 [1] CRAN (R 3.4.4)
##  rlang         0.3.1   2019-01-08 [1] CRAN (R 3.4.4)
##  rmarkdown     1.11    2018-12-08 [1] CRAN (R 3.4.4)
##  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.4.4)
##  rstudioapi    0.9.0   2019-01-09 [1] CRAN (R 3.4.4)
##  rvest         0.3.2   2016-06-17 [1] CRAN (R 3.4.4)
##  scales        1.0.0   2018-08-09 [1] CRAN (R 3.4.4)
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.4.4)
##  shiny         1.2.0   2018-11-02 [1] CRAN (R 3.4.4)
##  stringi       1.3.1   2019-02-13 [1] CRAN (R 3.4.4)
##  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 3.4.4)
##  testthat      2.0.1   2018-10-13 [1] CRAN (R 3.4.4)
##  tibble      * 2.0.1   2019-01-12 [1] CRAN (R 3.4.4)
##  tidyr       * 0.8.3   2019-03-01 [1] CRAN (R 3.4.4)
##  tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.4.4)
##  tidyverse   * 1.2.1   2017-11-14 [1] CRAN (R 3.4.4)
##  usethis       1.4.0   2018-08-14 [1] CRAN (R 3.4.4)
##  utf8          1.1.4   2018-05-24 [1] CRAN (R 3.4.4)
##  viridisLite   0.3.0   2018-02-01 [1] CRAN (R 3.4.4)
##  withr         2.1.2   2018-03-15 [1] CRAN (R 3.4.4)
##  xfun          0.5     2019-02-20 [1] CRAN (R 3.4.4)
##  xml2          1.2.0   2018-01-24 [1] CRAN (R 3.4.4)
##  xtable        1.8-3   2018-08-29 [1] CRAN (R 3.4.4)
##  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.4.4)
## 
## [1] /home/marco/R/x86_64-pc-linux-gnu-library/3.4
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library

References

Neuwirth, N, and J Püringer. 2016. “CLP Minimum Classifications Raise Issues Concerning Occupational Safety and Health.” Chemikalienrecht 76: 107–14.